perf: Optimize concat()/concat_ws() UDFs by neilconway · Pull Request #20317 · apache/datafusion

neilconway · 2026-02-12T13:41:01Z

Which issue does this PR close?

Closes Optimize concat() #20316.

Rationale for this change

Faster is better.

What changes are included in this PR?

This commit implements three optimizations:

In StringViewArrayBuilder, we recreated block after every call to append_offset. It is cheaper to instead clear and re-use block.
In StringViewArrayBuilder::write(), we re-validated that a string array consists of valid UTF8 characters. This was unnecessary work and can be skipped.
In the concat() UDF implementation, we miscalculated the initial size of the StringViewArrayBuilder buffer. This didn't lead to incorrect behavior but it resulted in unnecessarily needing to reallocate the buffer.

Are these changes tested?

Yes; no additional test cases warranted.

Are there any user-facing changes?

No.

neilconway · 2026-02-12T13:41:27Z

Benchmark results:

$ cargo bench --bench concat -- --baseline concat-vanilla
   Compiling datafusion-functions v52.1.0 (/Users/neilconway/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 46.20s
     Running benches/concat.rs (target/release/deps/concat-4233805a3f40f888)
Gnuplot not found, using plotters backend
concat function/concat/1024
                        time:   [10.485 µs 10.510 µs 10.544 µs]
                        change: [−5.7512% −5.3786% −4.9576%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

concat function/concat/4096
                        time:   [39.832 µs 39.896 µs 39.957 µs]
                        change: [−6.1783% −5.6993% −5.2265%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

concat function/concat/8192
                        time:   [79.562 µs 79.690 µs 79.831 µs]
                        change: [−5.3363% −5.0808% −4.8152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

concat function/concat_view/1024
                        time:   [23.159 µs 23.197 µs 23.236 µs]
                        change: [−82.546% −82.420% −82.307%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild

concat function/concat_view/4096
                        time:   [97.326 µs 97.967 µs 98.620 µs]
                        change: [−81.132% −81.027% −80.908%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  7 (7.00%) high mild

concat function/concat_view/8192
                        time:   [206.99 µs 207.77 µs 208.67 µs]
                        change: [−80.382% −80.269% −80.158%] (p = 0.00 < 0.05)
                        Performance has improved.

concat function/concat/scalar
                        time:   [1.0204 µs 1.0226 µs 1.0249 µs]
                        change: [−0.7254% −0.4513% −0.2078%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild

datafusion/functions/src/string/concat.rs

This commit implements three optimizations: * In `StringViewArrayBuilder`, we re-allocated `block` after every call to `append_offset`. It is cheaper to instead clear and re-use `block`. * In `StringViewArrayBuilder::write()`, we re-validated that a string array consists of valid UTF8 characters. This was unnecessary work and can be skipped. * In the concat() UDF implementation, we miscalculated the initial size of the StringViewArrayBuilder buffer. This didn't lead to incorrect behavior but it resulted in unnecessarily needing to reallocate the buffer.

neilconway · 2026-02-15T15:34:06Z

@Jefffrey Is this okay to land in main, do you think? Lmk if you have other feedback or concerns.

martin-g · 2026-02-18T07:55:19Z

datafusion/functions/src/string/concat.rs

                            let string_array = as_string_view_array(array)?;

-                            data_size += string_array.len();
+                            data_size += string_array.total_buffer_bytes_used();


This does not include the inlined strings (shorter than 12 bytes). An array of only short strings will report 0.
Is this a problem ?!

If YES then let's use something like:

self.views().iter().sum().min(usize::MAX) as usize

Fair point.

On your suggested fix: summing the entire 128 bit view seems wrong (we don't want to add the buffer indexes and offsets, etc.). I think it would work to just use the low 32 bits of each view, but I'm leery of depending on an Arrow implementation detail like that.

We could iterate over the strings and sum their length, but that seems like overkill for a buffer sizing hint.

How about:

Use total_buffer_bytes_used() for now; this is just a capacity hint, so an estimate is okay

Add a comment for the short-strings case

Open an issue in Arrow to have them add a helper for this use-case, as it seems reasonably useful in general

My suggestion is similar to the impl at https://github.com/apache/arrow-rs/blob/2f40f78e4feae3aee261d9608cede9535e1429e0/arrow-array/src/array/byte_view_array.rs#L685. It just does not filter out the small strings and uses u128 for the sum before the final cast to usize.

The code in Arrow casts *v to u32, which takes the low 32 bits. Summing all 128 bit values and then taking the min of that value and usize::MAX does not seem to do the right thing, unless I'm misunderstanding completely.

In any case, I'd prefer to not depend on Arrow implementation details, if we can avoid it.

I filed apache/arrow-rs#9435 for this

You are right!

Jefffrey · 2026-02-19T02:37:33Z

I think this should be good to merge once the conflict is resolved; it would be nice indeed if we had a unified way of estimating (whether accurate or not) view size, whether upstream or in a common function in DataFusion but that can be a follow up

neilconway · 2026-02-19T02:46:15Z

@Jefffrey Great! I resolved the merge conflict.

Jefffrey · 2026-02-19T03:20:07Z

Thanks @neilconway, @alamb & @martin-g

github-actions bot added the functions Changes to functions implementation label Feb 12, 2026

Jefffrey approved these changes Feb 13, 2026

View reviewed changes

datafusion/functions/src/string/concat.rs Outdated Show resolved Hide resolved

neilconway mentioned this pull request Feb 13, 2026

[EPIC] Optimize performance for slow expressions apache/datafusion-comet#2986

Open

neilconway force-pushed the neilc/optimize-concat branch from 0eac046 to 992b569 Compare February 13, 2026 22:36

martin-g self-assigned this Feb 18, 2026

martin-g changed the title ~~perf: Optimize concat() UDF~~ perf: Optimize concat()/concat_ws() UDFs Feb 18, 2026

martin-g reviewed Feb 18, 2026

View reviewed changes

Add comment for total_buffer_bytes_used() undercount

88b70bf

neilconway mentioned this pull request Feb 18, 2026

GenericByteViewArray: support finding total length of all strings apache/arrow-rs#9435

Open

Merge remote-tracking branch 'origin/main' into neilc/optimize-concat

e50e725

Jefffrey added this pull request to the merge queue Feb 19, 2026

Merged via the queue into apache:main with commit 4f4e814 Feb 19, 2026
28 checks passed

neilconway deleted the neilc/optimize-concat branch February 19, 2026 12:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

perf: Optimize concat()/concat_ws() UDFs#20317

perf: Optimize concat()/concat_ws() UDFs#20317
Jefffrey merged 3 commits intoapache:mainfrom
neilconway:neilc/optimize-concat

neilconway commented Feb 12, 2026

Uh oh!

neilconway commented Feb 12, 2026

Uh oh!

Uh oh!

neilconway commented Feb 15, 2026

Uh oh!

martin-g Feb 18, 2026

Uh oh!

neilconway Feb 18, 2026

Uh oh!

martin-g Feb 18, 2026 •

edited

Loading

Uh oh!

neilconway Feb 18, 2026

Uh oh!

neilconway Feb 18, 2026

Uh oh!

martin-g Feb 18, 2026

Uh oh!

Jefffrey commented Feb 19, 2026

Uh oh!

neilconway commented Feb 19, 2026

Uh oh!

Uh oh!

Jefffrey commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

Conversation

neilconway commented Feb 12, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

neilconway commented Feb 12, 2026

Uh oh!

Uh oh!

neilconway commented Feb 15, 2026

Uh oh!

martin-g Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

neilconway Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

martin-g Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neilconway Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

neilconway Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

martin-g Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Jefffrey commented Feb 19, 2026

Uh oh!

neilconway commented Feb 19, 2026

Uh oh!

Uh oh!

Jefffrey commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

martin-g Feb 18, 2026 •

edited

Loading